Extraction of protein interaction information from unstructured text using a context-free grammar

Authors

  • Joshua M. Temkin
  • Mark R. Gilder
Abstract

MOTIVATION: As research into disease pathology and cellular function continues to generate vast amounts of data on protein, gene and small molecule (PGSM) interactions, there is a critical need to capture these results in structured formats that allow computational analysis. Although many efforts have been made to create databases that store this information in computer-readable form, populating them still largely requires manually interpreting and extracting interaction relationships from the biological research literature. Efficiently and accurately automating the extraction of interactions from unstructured text would greatly improve the content of these databases and provide a way to manage the continued growth of newly published literature.

RESULTS: In this paper, we describe a system for extracting PGSM interactions from unstructured text. By utilizing a lexical analyzer and a context-free grammar (CFG), we demonstrate that efficient parsers can be constructed for extracting these relationships from natural language with high rates of recall and precision. Our results show that this technique achieved a recall of 83.5% and a precision of 93.1% for recognizing PGSM names, and a recall of 63.9% and a precision of 70.2% for extracting interactions between these entities. In contrast to other published techniques, the use of a CFG significantly reduces the complexity of natural language processing by focusing on domain-specific structure rather than analyzing the semantics of a given language. Additionally, our approach provides a level of abstraction for adding new rules to extract other types of biological relationships beyond PGSM relationships.

AVAILABILITY: The program and corpus are available by request from the authors.
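To make the lexer-plus-grammar idea concrete, the following is a minimal Python sketch, not the authors' system. The lexicon (`ENTITIES`, `VERBS`), the function names, and the two-rule toy grammar are all assumptions introduced for illustration; the abstract does not describe the actual dictionaries or grammar rules. The lexer tags known entity names and interaction verbs and skips everything else, and a small recursive-descent parser applies the rules `interaction -> entities VERB entities` and `entities -> ENTITY | ENTITY CONJ entities`.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy lexicon; a real system would need far larger dictionaries.
ENTITIES = {"RAD51", "BRCA2", "p53", "MDM2"}
VERBS = {"binds", "activates", "inhibits", "phosphorylates"}
CONJ = {"and"}

Token = Tuple[str, str]  # (category, text)


def lex(sentence: str) -> List[Token]:
    """Tag words as ENTITY, VERB or CONJ; all other words are skipped."""
    tokens: List[Token] = []
    for word in re.findall(r"[A-Za-z0-9]+", sentence):
        if word in ENTITIES:
            tokens.append(("ENTITY", word))
        elif word.lower() in VERBS:
            tokens.append(("VERB", word.lower()))
        elif word.lower() in CONJ:
            tokens.append(("CONJ", word.lower()))
    return tokens


def parse_entities(tokens: List[Token], pos: int) -> Optional[Tuple[List[str], int]]:
    """entities -> ENTITY | ENTITY CONJ entities"""
    if pos >= len(tokens) or tokens[pos][0] != "ENTITY":
        return None
    names = [tokens[pos][1]]
    pos += 1
    if pos < len(tokens) and tokens[pos][0] == "CONJ":
        rest = parse_entities(tokens, pos + 1)
        if rest is not None:
            more, pos = rest
            names.extend(more)
    return names, pos


def extract(sentence: str) -> List[Tuple[str, str, str]]:
    """interaction -> entities VERB entities; returns (agent, verb, target) triples."""
    tokens = lex(sentence)
    left = parse_entities(tokens, 0)
    if left is None:
        return []
    agents, pos = left
    if pos >= len(tokens) or tokens[pos][0] != "VERB":
        return []
    verb = tokens[pos][1]
    right = parse_entities(tokens, pos + 1)
    if right is None:
        return []
    targets, _ = right
    return [(a, verb, t) for a in agents for t in targets]


if __name__ == "__main__":
    print(extract("BRCA2 binds RAD51 and p53 in vitro."))
```

Because the lexer discards untagged words, this sketch ignores negation and other context; it only illustrates how grammar rules over token categories can yield interaction triples without deeper semantic analysis.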

Similar resources

Information Extraction using Context-free Grammatical Inference from Positive Examples

Information extraction from textual data has various applications, such as semantic search. Although learning from positive examples has theoretical limitations, for many useful applications (including natural languages) a substantial part of the practical structure (CFG) can be captured by the framework introduced in this paper. Our approach to automate identification of structural information is based on gramm...

Context-free Grammar Learning from Text Document using Sequential Pattern

The World Wide Web and information systems have made significant advances over the last two decades, as reflected in their dominance in various business and scientific applications. As estimated by Blumberg and Atre, more than 85% of all business information exists in the form of unstructured and semi-structured documents, typically formatted for human viewing, not for system processing. Extracti...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

The Interaction of Gender with Text Enhancement and Meta-cognitive Grammar Instruction on Learning and Recall of English Grammar

The current research was an effort to study the interaction of gender with text enhancement and meta-cognitive grammar instruction on learning and recall of English grammar. To this end, two groups of students consisting of 51 learners from both genders were formed. The participants were 51 male and 51 female learners. The 51 participants of each gender were further divided into two groups. The...


Journal title:
  • Bioinformatics

Volume 19, Issue 16

Pages: -

Publication date: 2003